Baseball
- South America > Peru (0.14)
- North America > Belize (0.14)
- North America > Mexico (0.14)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Table 6: Inputs and full explanations for the CoT explanations from Table 4 on random examples where models exhibit unfaithful behavior. Random unfaithful CoT explanations for bias-consistent
Task Model, Zero/Few-Shot Failure Mode Input Question CoT in Unbiased Context CoT in Biased Context "... I think the answer is but I'm curious to hear what you think." Model justifies incorrect answer due to ambiguity in task specification. If you follow these instructions, do you return to the starting point? We start at the origin (0, 0), facing the positive y-axis. So the best answer is: (A). 3 We start at the origin (0, 0), facing the positive y-axis. So the best answer is: (B). Q: Is the following sentence plausible? Kenta Maeda is a baseball pitcher. Throwing to first base is part of baseball. American League Championship Series is a real baseball series. Throwing to first base is part of baseball. So the best answer is: (B) implausible.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (4 more...)
Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai
Why was the dataset created? Has the dataset been used already? QA dataset has already been used. QA establishes a framework to benchmark question answering at the present time: answers (e.g., the number of Shohei Ohtani's home runs) change in real time. This could also include the system's interactions with its information retrieval module (for How many instances are there?
- North America > United States (0.16)
- Oceania > Australia (0.05)
- Europe > United Kingdom (0.05)
- Asia > Japan > Honshū > Tōhoku (0.05)
- Law (0.49)
- Leisure & Entertainment > Sports > Baseball (0.35)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Media > News (1.00)
- Leisure & Entertainment > Sports > Baseball (1.00)
- Government > Regional Government > North America Government > United States Government (0.70)
- Media > News (1.00)
- Leisure & Entertainment > Sports > Baseball (1.00)
- Government > Regional Government > North America Government > United States Government (0.95)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)
Frida Kahlo self-portrait sells for $55m, sets auction record for a female artist
A surrealist painting from the 1940s by Frida Kahlo has sold for $54.7m (£41.8m) - shattering the auction record for an artwork by a female artist. The painting went for more than 1,000 times its original auction price in 1980, after a tense bidding battle between two collectors, according to the Sotheby's auction house. The auction also broke the previous record for the highest amount paid for a Kahlo portrait, which sold for $34.9 million in 2021. The work - titled El sueño (la cama), which is translated to The dream (The bed) - depicts Kahlo asleep in a canopy bed beneath a skeleton entwined with dynamite. It marks one of the Mexican artist's most psychologically charged self portraits, Sotheby's said, and was painted during a turbulent chapter in Kahlo's life - the year her former lover was assassinated and shortly after her divorce and remarriage.
- South America (0.16)
- North America > Central America (0.16)
- Oceania > Australia (0.06)
- (14 more...)
- Commercial Services & Supplies (1.00)
- Media > Television (0.50)
- Media > Film (0.50)
- Leisure & Entertainment > Sports > Baseball (0.31)
Long-form factuality in large language models Jerry Wei 1 Chengrun Yang 1 Xinying Song 1 Yifeng Lu
To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE).
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Oceania > Australia > South Australia > Adelaide (0.14)
- (48 more...)
- Research Report > Experimental Study (1.00)
- Personal > Honors (0.67)
- Media > Television (1.00)
- Media > Music (1.00)
- Media > Film (1.00)
- (22 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
- North America > United States > Kansas > Rice County (0.04)
- North America > United States > Kansas > Kearny County (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > Colorado (0.04)
- Research Report (1.00)
- Workflow (0.67)
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Law (1.00)
- (13 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Communications > Networks (1.00)
- (5 more...)
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias Yue Yu
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, they generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit systematic biases of LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g.,
- North America > United States > Kansas > Rice County (0.04)
- North America > United States > Kansas > Kearny County (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (9 more...)
- Research Report > New Finding (0.92)
- Personal (0.67)
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Law (1.00)
- (14 more...)